144 research outputs found

A Motif-based Approach for Identifying Controversy

Author: Coletto Mauro
Garimella Kiran
Gionis Aristides
Lucchese Claudio
Publication date: 01/01/2017

Among the topics discussed on social media, some lead to controversy. A number of recent studies have focused on the problem of identifying controversy in social media, mostly by analysing textual content or by relying on global network structure. Such approaches have strong limitations due to the difficulty of understanding natural language and of investigating the global network structure. In this work we show that it is possible to detect controversy in social media by exploiting network motifs, i.e., local patterns of user interaction. The proposed approach allows for a language-independent, fine-grained, and efficient-to-compute analysis of user discussions and their evolution over time. The supervised model exploiting motif patterns can achieve 85% accuracy, with an improvement of 7% compared to baseline structural, propagation-based, and temporal network features.

arXiv.org e-Print Archive

Archivio Ricerca Ca'Foscari

Aaltodoc Publication Archive

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

Association for the Advancement of Artificial Intelligence: AAAI Publications
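
The abstract above describes network motifs as local patterns of user interaction that feed a supervised model. As a minimal sketch of what such features might look like, the Python below counts three simple patterns in a reply graph: reciprocated dyads, one-way dyads, and closed triads. The function name `motif_features`, the input format, and the choice of these three patterns are assumptions made for illustration; the paper's actual motif set is not given in this listing.

```python
from itertools import combinations

def motif_features(reply_edges):
    """Count simple local interaction patterns in one discussion thread.

    reply_edges: iterable of (replier, replied_to) pairs (hypothetical format).
    Returns a dict of counts usable as a feature vector for a classifier.
    """
    # Drop self-replies and duplicate edges.
    directed = {(u, v) for u, v in reply_edges if u != v}

    # Undirected pairs of users that interacted in at least one direction.
    pairs = {frozenset(edge) for edge in directed}

    reciprocal = 0
    neighbours = {}
    for pair in pairs:
        u, v = tuple(pair)
        if (u, v) in directed and (v, u) in directed:
            reciprocal += 1
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    # Closed triads: three users who all interacted with each other.
    triangles = sum(
        1
        for a, b, c in combinations(sorted(neighbours), 3)
        if b in neighbours[a] and c in neighbours[a] and c in neighbours[b]
    )

    return {
        "reciprocal_dyads": reciprocal,
        "one_way_dyads": len(pairs) - reciprocal,
        "triangles": triangles,
    }
```

Counts like these depend only on who interacts with whom, which is why such features are language-independent.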

Learning Relatedness Measures for Entity Linking

Author: Ceccarelli Diego
Lucchese Claudio
Orlando Salvatore
Perego Raffaele
Trani Salvatore
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2013

Entity Linking is the task of detecting, in text documents, relevant mentions of entities of a given knowledge base. To this end, entity-linking algorithms use several signals and features extracted from the input text or from the knowledge base. The most important of such features is entity relatedness. Indeed, we argue that these algorithms benefit from maximizing the relatedness among the relevant entities selected for annotation, since this minimizes errors in disambiguating entity-linking. The definition of an effective relatedness function is thus a crucial point in any entity-linking algorithm. In this paper we address the problem of learning high-quality entity relatedness functions. First, we formalize the problem of learning entity relatedness as a learning-to-rank problem. We propose a methodology to create reference datasets on the basis of manually annotated data. Finally, we show that our machine-learned entity relatedness function performs better than other relatedness functions previously proposed, and, more importantly, improves the overall performance of different state-of-the-art entity-linking algorithms.

Archivio Ricerca Ca'Foscari

CiteSeerX

Crossref

Archivio della Ricerca - Università di Pisa

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

IMT Institutional Repository

On Suggesting Entities as Web Search Queries

Author: Ceccarelli Diego
Gordea Sergiu
Lucchese Claudio
Nardini Franco Maria
Perego Raffaele
Publication date: 01/01/2013

The Web of Data is growing in popularity and dimension, and named entity exploitation is gaining importance in many research fields. In this paper, we explore the use of entities that can be extracted from a query log to enhance query recommendation. In particular, we extend a state-of-the-art recommendation algorithm to take into account the semantic information associated with submitted queries. Our novel method generates highly related and diversified suggestions that we assess by means of a new evaluation technique. The manually annotated dataset used for performance comparisons has been made available to the research community to favor the repeatability of experiments.

IMT Institutional Repository

Discovering Europeana users’ search behavior

Author: Ceccarelli Diego
Gordea Sergiu
Lucchese Claudio
Nardini Franco Maria
Perego Raffaele
Tolomei Gabriele
Publication venue: ERCIM
Publication date: 01/01/2011

Europeana is a strategic project funded by the European Commission with the goal of making Europe's cultural and scientific heritage accessible to the public. ASSETS is a two-year Best Practice Network co-funded by the CIP PSP Programme to improve the performance, accessibility and usability of the Europeana search engine. Here we present a characterization of the Europeana logs by showing statistics on common behavioural patterns of Europeana users.

Archivio della ricerca- Università di Roma La Sapienza

IMT Institutional Repository

EiFFFeL: Enforcing Fairness in Forests by Flipping Leaves

Author: Claudio Lucchese
Salvatore Orlando
Seyum Assefa Abebe
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2022

Nowadays Machine Learning (ML) techniques are extensively adopted in many socially sensitive systems, which makes it necessary to carefully study the fairness of the decisions taken by such systems. Many approaches have been proposed to detect and remove bias against individuals or specific groups, which might originally come from biased training datasets or algorithm design. In this regard, we propose a fairness-enforcing approach called EiFFFeL (Enforcing Fairness in Forests by Flipping Leaves), which exploits tree-based or leaf-based post-processing strategies to relabel leaves of selected decision trees of a given forest. Experimental results show that our approach achieves a user-defined group fairness degree without losing a significant amount of accuracy.

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

Feature Partitioning for Robust Tree Ensembles and their Certification in Adversarial Scenarios

Author: Calzavara Stefano
Lucchese Claudio
Marcuzzi Federico
Orlando Salvatore
Publication date: 07/04/2020

Machine learning algorithms, however effective, are known to be vulnerable in adversarial scenarios where a malicious user may inject manipulated instances. In this work we focus on evasion attacks, where a model is trained in a safe environment and exposed to attacks at test time. The attacker aims at finding a minimal perturbation of a test instance that changes the model outcome. We propose a model-agnostic strategy that builds a robust ensemble by training its basic models on feature-based partitions of the given dataset. Our algorithm guarantees that the majority of the models in the ensemble cannot be affected by the attacker. We evaluate the proposed strategy on decision tree ensembles, and we also propose an approximate certification method for tree ensembles that efficiently assesses the minimal accuracy of a forest on a given dataset, avoiding the costly computation of evasion attacks. Experimental evaluation on publicly available datasets shows that the proposed strategy outperforms state-of-the-art adversarial learning algorithms against evasion attacks.

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

Feature partitioning for robust tree ensembles and their certification in adversarial scenarios

Author: Calzavara Stefano
Lucchese Claudio
Marcuzzi Federico
Orlando Salvatore
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2021

Machine learning algorithms, however effective, are known to be vulnerable in adversarial scenarios where a malicious user may inject manipulated instances. In this work, we focus on evasion attacks, where a model is trained in a safe environment and exposed to attacks at inference time. The attacker aims at finding a perturbation of an instance that changes the model outcome. We propose a model-agnostic strategy that builds a robust ensemble by training its basic models on feature-based partitions of the given dataset. Our algorithm guarantees that the majority of the models in the ensemble cannot be affected by the attacker. We apply the proposed strategy to decision tree ensembles, and we also propose an approximate certification method for tree ensembles that efficiently provides a lower bound of the accuracy of a forest in the presence of attacks on a given dataset, avoiding the costly computation of evasion attacks. Experimental evaluation on publicly available datasets shows that the proposed feature partitioning strategy provides a significant accuracy improvement with respect to competitor algorithms and that the proposed certification method allows one to accurately estimate the effectiveness of a classifier where the brute-force approach would be infeasible.

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari
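
The feature-partitioning idea in the two abstracts above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the attacker is assumed to alter at most `b` features, the ensemble uses `k = 2b + 1` disjoint feature groups (so at most `b` models can see a modified feature), a nearest-centroid classifier stands in for the decision trees, and `certified_robust` is a simple vote-counting sufficient condition rather than the certification method of the paper. All function names are hypothetical.

```python
from collections import Counter

def partition_features(n_features, k):
    """Split feature indices into k disjoint round-robin groups."""
    if n_features < k:
        raise ValueError("need at least one feature per partition")
    return [list(range(i, n_features, k)) for i in range(k)]

def train_centroid_model(X, y, feats):
    """Nearest-centroid classifier that only looks at the features in `feats`."""
    counts = Counter(y)
    sums = {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(feats))
        for j, f in enumerate(feats):
            acc[j] += row[f]
    centroids = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}

    def predict(row):
        def dist(c):
            return sum((row[f] - centroids[c][j]) ** 2 for j, f in enumerate(feats))
        return min(sorted(centroids), key=dist)

    return predict

def train_partitioned_ensemble(X, y, b):
    """Train 2b + 1 models, each on its own disjoint feature group."""
    groups = partition_features(len(X[0]), 2 * b + 1)
    return [train_centroid_model(X, y, g) for g in groups]

def predict_majority(models, row):
    """Majority vote over the ensemble."""
    return Counter(m(row) for m in models).most_common(1)[0][0]

def certified_robust(models, row, label, b):
    """Sufficient condition: even if b correct votes are flipped,
    the correct label still holds a strict majority."""
    correct = sum(1 for m in models if m(row) == label)
    return correct - b > len(models) / 2
```

Because the groups are disjoint, changing one feature can change the vote of only the single model trained on that feature's group, which is what keeps a majority of the ensemble out of the attacker's reach under the stated budget.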

Extending the state-of-the-art of constraint-based pattern discovery

Author: Claudio Lucchese
Francesco Bonchi
Publication venue: Springer Fachmedien Wiesbaden GmbH
Publication date: 01/01/2005

The constraint-based pattern discovery paradigm was introduced with the aim of providing the user with a tool to drive the discovery process towards potentially interesting patterns, with the positive side effect of achieving a more efficient computation. In this paper we review and extend the state-of-the-art of the constraints that can be pushed in a frequent pattern computation. We introduce novel data reduction techniques which are able to exploit convertible anti-monotone constraints (e.g., constraints on average or median) as well as tougher constraints (e.g., constraints on variance or standard deviation). A thorough experimental study is performed and it confirms that our framework outperforms previous algorithms for convertible constraints, and exploits the tougher ones with the same effectiveness. Finally, we highlight that the main advantage of our approach, i.e., pushing constraints by means of data reduction in a level-wise framework, is that different properties of different constraints can be exploited all together, and the total benefit is always greater than the sum of the individual benefits. This consideration leads to the definition of a general Apriori-like algorithm which is able to exploit all possible kinds of constraints studied so far.

CiteSeerX
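
To make the notion of pushing a constraint into a level-wise frequent pattern computation concrete, here is a minimal Apriori-style sketch. It handles only the simplest case, a plain anti-monotone constraint (total price of an itemset at most `max_total`, with non-negative prices), and does not implement the data reduction techniques for convertible or tougher constraints that the abstract describes. The function name and parameters are hypothetical.

```python
from itertools import combinations

def constrained_apriori(transactions, min_support, price, max_total):
    """Return {itemset: support} for itemsets that are frequent AND cheap.

    Both conditions are anti-monotone (prices assumed non-negative):
    if an itemset fails either one, every superset fails too, so it can
    be pruned before its supersets are ever generated.
    """
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    def satisfies(itemset):
        cheap = sum(price[i] for i in itemset) <= max_total
        return cheap and support(itemset) >= min_support

    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items if satisfies(frozenset([i]))]
    result = {}

    while level:
        for itemset in level:
            result[itemset] = support(itemset)
        survivors = set(level)
        size = len(level[0]) + 1
        # Join step: merge pairs of surviving itemsets into larger candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Prune step: every (size - 1)-subset must itself have survived.
        level = [
            c for c in candidates
            if all(frozenset(sub) in survivors for sub in combinations(c, size - 1))
            and satisfies(c)
        ]
    return result
```

In this sketch the price constraint removes candidates at the same point as the support threshold, so neither check is ever run on supersets of an itemset that has already failed.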

The Impact of Negative Samples on Learning to Rank

Author: Claudio Lucchese
Franco Maria Nardini
Raffaele Perego
Salvatore Trani
Publication venue: CEUR-WS.org
Publication date: 01/01/2017

Learning-to-Rank (LtR) techniques leverage machine learning algorithms and large amounts of training data to induce high-quality ranking functions. Given a set of documents and a user query, these functions are able to predict a score for each of the documents that is in turn exploited to induce a relevance ranking. The effectiveness of these learned functions has been proved to be significantly affected by the data used to learn them. Several analyses and document selection strategies have been proposed in the past to deal with this aspect. In this paper we review the state-of-the-art proposals and we report the results of a preliminary investigation of a new sampling strategy aimed at reducing the number of non-relevant query-document pairs, so as to significantly decrease the training time of the learning algorithm and to increase the final effectiveness of the model by reducing noise and redundancy in the training set.

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari
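
The abstract above concerns reducing the number of non-relevant query-document pairs in a learning-to-rank training set. The sampling strategy investigated in the paper is not described in this listing, so the sketch below shows only a generic baseline for comparison: keep every relevant pair and uniformly subsample the non-relevant ones per query. The tuple layout `(query_id, doc_id, relevance)` and the function name are assumptions.

```python
import random

def downsample_negatives(samples, max_neg_per_query, seed=0):
    """Keep all relevant pairs and at most `max_neg_per_query` non-relevant
    pairs for each query, chosen uniformly at random.

    samples: list of (query_id, doc_id, relevance) tuples, where
    relevance == 0 marks a non-relevant pair (hypothetical format).
    """
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible

    by_query = {}
    for sample in samples:
        by_query.setdefault(sample[0], []).append(sample)

    kept = []
    for rows in by_query.values():
        positives = [r for r in rows if r[2] > 0]
        negatives = [r for r in rows if r[2] == 0]
        if len(negatives) > max_neg_per_query:
            negatives = rng.sample(negatives, max_neg_per_query)
        kept.extend(positives + negatives)
    return kept
```

A smaller training set built this way shortens training; whether it also improves ranking quality depends on which negatives are dropped, which is the question the paper studies.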